Working with geographical data using DataFrames.jl¶

Minisimposium for tabular data, creation of this tutorial has been supported by the Polish National Agency for Academic Exchange under the Strategic Partnerships programme, grant number BPI/PST/2021/1/00069/U/00001

In [20]:
using OSMToolset  # Pkg.add(url="https://github.com/pszufe/OSMToolset.jl", rev="main")
using DataFrames
using PyCall
using OpenStreetMapX
using Colors
using Plots
using SimpleValueGraphs
using OpenStreetMapXPlot

flm = pyimport("folium");

The boston.osm file was downloaded form [https://overpass-api.de/api/map?bbox=-71.1098,42.3521,-71.0502,42.3805]

You can used a smaller inbuilt map.osm file instead too (uncomment code)

In [21]:
#file = joinpath(dirname(pathof(OSMToolset)),"..","test","data","map.osm")
file = joinpath(dirname(pathof(OSMToolset)),"..","boston.osm")
tt = @elapsed df = find_poi(file)
map_data = get_map_data(file, use_cache=false, trim_to_connected_graph=true );

println("OSM file parsing speed: $(round(filesize(file)/(1024*1024)/tt,digits=1)) MB/s")
OSM file parsing speed: 11.4 MB/s

Attractiveness Spatial Index (ix) is built on the base of a DataFrame. Attractiveness of a location is evaluated by providing its coordinates.

In [34]:
ix = AttractivenessSpatIndex(df);

# Stata Center campus
lat=42.361732327506516
lon=-71.09043164955334

attract = attractiveness(ix, lat, lon)
Out[34]:
(education = 364.07988461350504, parking = 1.0575223826966949, shopping = 16.62025527806438, transport = 7.1895280000006405, healthcare = 3.477632654568438, entertainment = 47.89522092353563, leisure = 290.5967600088863, restaurants = 113.55682405754388)

explain=true makes it possible to understand how the value was built

In [23]:
attract = attractiveness(ix, lat, lon, explain=true)
attract[2]
Out[23]:
283×5 DataFrame
258 rows omitted
Rowclasspointspoidistancelatlon
SymbolInt64Float64Float64Float64
1education201152.6842.3518-71.0864
2education51268.242.3522-71.082
3education202297.6742.3522-71.0657
4leisure51063.1942.3531-71.0849
5leisure51067.7642.3531-71.0846
6education201725.1542.3533-71.108
7education51427.1842.3535-71.0771
8education202475.8542.3538-71.0623
9leisure51450.2542.3543-71.1049
10education202936.642.3545-71.0562
11education51394.842.3546-71.0765
12leisure51195.0242.3547-71.0795
13leisure51272.542.3548-71.1027
⋮⋮⋮⋮⋮⋮
272leisure51302.9342.3726-71.0845
273leisure51383.6742.3733-71.0841
274leisure51474.0942.3733-71.0816
275leisure51406.3642.3735-71.084
276education52326.5242.3751-71.1122
277education31762.1342.3774-71.0937
278education52728.0942.3778-71.0654
279education52798.4942.3783-71.0648
280education51968.0742.3784-71.0985
281education202419.7142.3785-71.0716
282education52653.1842.3788-71.0679
283education51974.3442.3789-71.0967
In [24]:
ix.df
Out[24]:
2576×10 DataFrame
2551 rows omitted
Rowelemtypeelemidnodeidlatlonkeyvalueclasspointsrange
SymbolInt64Int64Float64Float64StringStringStringInt64Int64
1node694808146948081442.357-71.0588public_transportstop_positiontransport5300
2node694821886948218842.3599-71.06public_transportstop_positiontransport5300
3node694829936948299342.3525-71.0549public_transportstop_positiontransport5300
4node694874236948742342.3736-71.0697railwaystationtransport10700
5node694874406948744042.3654-71.1037public_transportstop_positiontransport5300
6node694881916948819142.3612-71.0713public_transportstop_positiontransport5300
7node694888396948883942.3568-71.0633public_transportstop_positiontransport5300
8node694902846949028442.3652-71.0603public_transportstop_positiontransport5300
9node694909546949095442.3518-71.0627public_transportstop_positiontransport5300
10node694938106949381042.3534-71.0644public_transportstop_positiontransport5300
11node695046086950460842.3588-71.0578railwaystationtransport10700
12node695049586950495842.3667-71.0678public_transportstop_positiontransport5300
13node695053656950536542.3522-71.0626railwaystationtransport10700
⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮
2565relation1007674930042415242.358-71.0987leisuretrackleisure5800
2566relation1120144832717108742.3575-71.0611amenityuniversityeducation2010000
2567relation11201473132583865842.3538-71.0623amenitycollegeeducation2010000
2568relation1160116132845838042.3691-71.0722amenitycollegeeducation2010000
2569relation12661390262346583042.3666-71.0831amenityparkingparking5250
2570relation13405301418182949042.3665-71.0951leisureparkleisure5500
2571relation1420509032540320042.3802-71.0965amenitybankshopping1750
2572relation1420509132706737242.38-71.0963shopsupermarketshopping5500
2573relation1420509632752405542.3798-71.0943amenityrestaurantrestaurants5750
2574relation14205406978410900042.38-71.0934amenityparkingparking5250
2575relation1420540832717596942.38-71.0956amenityparkingparking5250
2576relation157048641080001256842.3551-71.1022leisuregardenleisure5500
In [35]:
df = deepcopy(ix.df)
vals = ENU.(LLA.(df.lat, df.lon),Ref(map_data.bounds) )  #Ref(ix.refLLA)
df.x .= getfield.(vals, :east)
df.y .= getfield.(vals, :north)
df
Out[35]:
2576×12 DataFrame
2551 rows omitted
Rowelemtypeelemidnodeidlatlonkeyvalueclasspointsrangexy
SymbolInt64Int64Float64Float64StringStringStringInt64Int64Float64Float64
1node694808146948081442.357-71.0588public_transportstop_positiontransport53001746.64-1035.44
2node694821886948218842.3599-71.06public_transportstop_positiontransport53001647.18-710.309
3node694829936948299342.3525-71.0549public_transportstop_positiontransport53002066.0-1529.94
4node694874236948742342.3736-71.0697railwaystationtransport10700846.854813.538
5node694874406948744042.3654-71.1037public_transportstop_positiontransport5300-1955.96-100.232
6node694881916948819142.3612-71.0713public_transportstop_positiontransport5300713.655-567.151
7node694888396948883942.3568-71.0633public_transportstop_positiontransport53001378.13-1056.26
8node694902846949028442.3652-71.0603public_transportstop_positiontransport53001619.06-125.945
9node694909546949095442.3518-71.0627public_transportstop_positiontransport53001423.73-1610.27
10node694938106949381042.3534-71.0644public_transportstop_positiontransport53001282.82-1430.66
11node695046086950460842.3588-71.0578railwaystationtransport107001831.26-832.385
12node695049586950495842.3667-71.0678public_transportstop_positiontransport53001007.5645.0045
13node695053656950536542.3522-71.0626railwaystationtransport107001433.02-1564.24
⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮
2565relation1007674930042415242.358-71.0987leisuretrackleisure5800-1540.54-922.13
2566relation1120144832717108742.3575-71.0611amenityuniversityeducation20100001559.93-977.266
2567relation11201473132583865842.3538-71.0623amenitycollegeeducation20100001454.71-1387.61
2568relation1160116132845838042.3691-71.0722amenitycollegeeducation2010000640.989308.822
2569relation12661390262346583042.3666-71.0831amenityparkingparking5250-254.19729.7408
2570relation13405301418182949042.3665-71.0951leisureparkleisure5500-1247.526.5816
2571relation1420509032540320042.3802-71.0965amenitybankshopping1750-1356.821540.61
2572relation1420509132706737242.38-71.0963shopsupermarketshopping5500-1341.311519.23
2573relation1420509632752405542.3798-71.0943amenityrestaurantrestaurants5750-1175.961503.01
2574relation14205406978410900042.38-71.0934amenityparkingparking5250-1100.231522.57
2575relation1420540832717596942.38-71.0956amenityparkingparking5250-1287.991526.96
2576relation157048641080001256842.3551-71.1022leisuregardenleisure5500-1830.14-1244.56
In [36]:
colrs = distinguishable_colors(length(ix.measures), [RGB(0.1,0.2,0.4)])
class2col =  Dict(ix.measures .=> colrs)
colrs
Out[36]:
No description has been provided for this image
In [37]:
m = flm.Map(tiles = "Stamen Toner")

line =0


function latlon(m::MapData,map_g_point_id::Int64)
    osm_node_ix = m.n[map_g_point_id]
    lla = LLA(m.nodes[osm_node_ix], m.bounds)
    return (lla.lat, lla.lon)
end

for e in SimpleValueGraphs.edges(map_data.g)
    # println(e)
    flm.PolyLine(     (latlon(map_data,e.src), latlon(map_data,e.dst)),
        color="#a4f3a7", weight=2, 
        opacity=0.8).add_to(m)
end

for row in eachrow(df) 
    line += 1
    info = "$(row.class):$(row.key)=$(row.value)"
    
    k = findfirst(==(Symbol(row.class)), ix.measures)
    flm.Circle((row.lat, row.lon), color="#$(hex(colrs[k]))",radius=row.points,
        fill_color="#$(hex(colrs[k]))", fill_opacity=0.06, tooltip=info).add_to(m)
    
end
m.fit_bounds([(minimum(df.lat), minimum(df.lon)), (maximum(df.lat), maximum(df.lon))])
MAP_BOUNDS = [(map_data.bounds.min_y,map_data.bounds.min_x),(map_data.bounds.max_y,map_data.bounds.max_x)]
flm.Rectangle(MAP_BOUNDS, color="blue",weight=2).add_to(m)

m.fit_bounds(MAP_BOUNDS)
m
Out[37]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [28]:
ix.measures
Out[28]:
8-element Vector{Symbol}:
 :education
 :entertainment
 :healthcare
 :leisure
 :parking
 :restaurants
 :shopping
 :transport
In [29]:
p = plotmap(map_data; width=900, height=600)
Out[29]:
In [30]:
using StatsBase


cellsize = 100  # meters
xwidth = maximum(df.x)-minimum(df.x)
ywidth = maximum(df.y)-minimum(df.y)
nrow = round(Int, ywidth / cellsize)
ncol = round(Int, xwidth / cellsize)
#data = Array{Float64}(undef, length(ix.measures),  nrow, ncol);

xmin = minimum(df.x)
ymin = minimum(df.y)

attdf = DataFrame()

for i in 1:nrow
    for j in 1:ncol
        x = xmin + cellsize*(j - 0.5)
        y = ymin + cellsize*(i - 0.5)
        enu = ENU(x,y) # ENU in mapdata coordinates
        lla = LLA(enu, map_data.bounds)
        enu2 = ENU(lla, ix.refLLA) #ENU in sptatial indec coordinates
        att = attractiveness(ix, enu2, +)
        for measu in ix.measures 
            a = att[measu]
            push!(attdf,(;measu, x,y,a))
        end
    end
end

attdf
Out[30]:
172360×4 DataFrame
172335 rows omitted
Rowmeasuxya
SymbolFloat64Float64Float64
1education-4545.5-3980.21138.46
2entertainment-4545.5-3980.210.0
3healthcare-4545.5-3980.210.0
4leisure-4545.5-3980.210.0
5parking-4545.5-3980.210.0
6restaurants-4545.5-3980.210.0
7shopping-4545.5-3980.210.0
8transport-4545.5-3980.210.0
9education-4445.5-3980.21141.379
10entertainment-4445.5-3980.210.0
11healthcare-4445.5-3980.210.0
12leisure-4445.5-3980.210.0
13parking-4445.5-3980.210.0
⋮⋮⋮⋮⋮
172349parking10754.59819.790.0
172350restaurants10754.59819.790.0
172351shopping10754.59819.790.0
172352transport10754.59819.793.04296
172353education10854.59819.790.0
172354entertainment10854.59819.790.0
172355healthcare10854.59819.790.0
172356leisure10854.59819.790.0
172357parking10854.59819.790.0
172358restaurants10854.59819.790.0
172359shopping10854.59819.790.0
172360transport10854.59819.794.47852
In [31]:
scale(col::RGB{Float64}, perc) = RGB(col.r+(1-col.r)*(1-perc), col.g+(1-col.g)*(1-perc), col.b+(1-col.b)*(1-perc))
Out[31]:
scale (generic function with 1 method)
In [32]:
p2 = deepcopy(p)

poiclass = :restaurants

points = attdf[attdf.measu .== poiclass, :]
points.z .=  points.a ./ maximum(points.a)


#p = scatter!(points1, color="#$(hex(colrs[k]))", ratio=1, title=string(i), size=(800,600))
#    for j in 1:length(atts)
#        p = scatter!(pts2[j], color=:darkblue, alpha = 0.05, markersize = atts[j] / 30 )
#    end
scatter!(p2, points.x, points.y;fillalpha=0.5,markershape=:rect, markeralpha=0.42,markerstrokewidth=0, markercolor=scale.(Ref(class2col[poiclass]), points.z))

p2
Out[32]: